Abstract

We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace
learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimension
high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like the minimum
covariance determinant (MCD) by computing the needed determinants and associated measures in much lower
dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is
computationally more efficient than the regularized methods in the high-dimension low-sample size setting, and
that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.

Keywords: High-dimensional, Robust, Outlier Detection, Contamination, Large p small n, Random 
Subspace Method, Minimum Covariance Determinant 
2015 MSC: 00-01, 99-00 


1. Introduction 

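We are given a dataset $\mathcal{D} = \{\mathbf{x}_1, \cdots, \mathbf{x}_n\}$, where $\mathbf{x}_i = (x_{i1}, \cdots, x_{ip})^\top \in \mathcal{X} \subset \mathbb{R}^{p}$, under the special scenario
in which $n \ll p$, referred to as the high dimension low sample size (HDLSS) setting. It is assumed that the basic
distribution of the $\mathbf{X}_i$'s is multivariate Gaussian, so that the density of $\mathbf{X}$ is given by $\phi_p(\mathbf{x}; \mu, \Sigma)$, with

$$\phi_p(\mathbf{x}; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}} \exp\left\{-\frac{1}{2}(\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu)\right\}. \qquad (1)$$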

It is further assumed that the dataset $\mathcal{D}$ is contaminated, with a proportion $\epsilon \in (0, \tau)$, where $\tau < e^{-1}$, of
observations that are outliers, so that under the $\epsilon$-contamination regime, the probability density function of $\mathbf{X}$ is
given by

$$p(\mathbf{x} | \mu, \Sigma, \epsilon, \eta, \gamma) = (1 - \epsilon)\,\phi_p(\mathbf{x}; \mu, \Sigma) + \epsilon\,\phi_p(\mathbf{x}; \mu + \eta, \gamma\Sigma), \qquad (2)$$

where $\eta$ represents the contamination of the location parameter $\mu$, while $\gamma$ captures the level of contamination of
the scatter matrix $\Sigma$. Given a dataset with the above characteristics, the goal of all outlier detection techniques
and methods is to select and isolate as many outliers as possible, so that subsequent robust statistical procedures
are not adversely affected by those outliers. In such scenarios, where the multivariate Gaussian is the assumed
basic underlying distribution, the classical Mahalanobis distance is the default measure of the proximity of the
observations, namely

$$d^2_{\mu,\Sigma}(\mathbf{x}_i) = (\mathbf{x}_i - \mu)^\top \Sigma^{-1} (\mathbf{x}_i - \mu), \qquad (3)$$

and experimenters often tackle the outlier detection task in such situations using either the
so-called Minimum Covariance Determinant (MCD) Algorithm [1] or some extension or adaptation thereof.
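
To make the contamination model (2) and the distance (3) concrete, the following is a minimal sketch that draws a contaminated Gaussian sample and computes squared Mahalanobis distances. The function names, the seed and the parameter values (eps = 0.1, eta = 5, gamma = 2) are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def sample_contaminated(n, mu, Sigma, eps, eta, gamma, rng):
    """Draw n points from (1 - eps) N(mu, Sigma) + eps N(mu + eta, gamma * Sigma)."""
    is_outlier = rng.random(n) < eps            # Bernoulli(eps) outlier labels
    X = rng.multivariate_normal(mu, Sigma, size=n)
    n_out = int(is_outlier.sum())
    if n_out > 0:
        # Outliers: location shifted by eta, scatter inflated by gamma.
        X[is_outlier] = rng.multivariate_normal(mu + eta, gamma * Sigma, size=n_out)
    return X, is_outlier

def mahalanobis_sq(X, mu, Sigma):
    """Squared Mahalanobis distance of every row of X, as in equation (3)."""
    diff = X - mu
    # Solve Sigma z = diff^T instead of forming the explicit inverse.
    return np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)

rng = np.random.default_rng(0)                  # illustrative seed
p = 3
mu, Sigma = np.zeros(p), np.eye(p)
X, is_outlier = sample_contaminated(500, mu, Sigma, eps=0.1, eta=5.0, gamma=2.0, rng=rng)
d2 = mahalanobis_sq(X, mu, Sigma)
```

Here the true $\mu$ and $\Sigma$ are plugged in; in practice they are unknown, which is exactly why robust estimates are needed.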

Algorithm 1 Minimum Covariance Determinant (MCD)

1: Select h observations, and form the dataset $\mathcal{D}_H$, where $H \subset \{1, \cdots, n\}$.
2: Compute the empirical covariance $\hat{\Sigma}_H$ and mean $\hat{\mu}_H$.
3: Compute the Mahalanobis distances $d^2_{\hat{\mu}_H, \hat{\Sigma}_H}(\mathbf{x}_i)$, $i = 1, \cdots, n$.
4: Select the h observations having the smallest Mahalanobis distance.
5: Update $\mathcal{D}_H$ and repeat steps 2 to 5 until $\det(\hat{\Sigma}_H)$ no longer decreases.
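
The steps of Algorithm 1 can be sketched in a few lines. This is a simplified single-start illustration under our own assumptions (random initial subset, stopping as soon as the log-determinant stops decreasing); it is not the FAST-MCD implementation, and the name `c_step_mcd` is hypothetical.

```python
import numpy as np

def c_step_mcd(X, h, rng, max_iter=100):
    """Single-start sketch of Algorithm 1: iterate until det(Sigma_H) stops decreasing."""
    n = X.shape[0]
    subset = rng.choice(n, size=h, replace=False)        # step 1: initial subset H
    prev_logdet = np.inf
    for _ in range(max_iter):
        mu = X[subset].mean(axis=0)                       # step 2: mean and covariance
        Sigma = np.cov(X[subset], rowvar=False)
        sign, logdet = np.linalg.slogdet(Sigma)
        if sign <= 0 or logdet >= prev_logdet:            # singular, or no further decrease
            break
        prev_logdet = logdet
        diff = X - mu                                     # step 3: distances for all n points
        d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
        subset = np.argsort(d2)[:h]                       # step 4: keep the h closest points
    # Recompute the estimates on the final subset so all outputs are consistent.
    mu = X[subset].mean(axis=0)
    Sigma = np.cov(X[subset], rowvar=False)
    return mu, Sigma, subset

rng = np.random.default_rng(0)                            # illustrative seed and data
n, p = 200, 2
X = rng.standard_normal((n, p))
is_out = np.zeros(n, dtype=bool)
is_out[:20] = True
X[is_out] += 8.0                                          # 10% shifted outliers
h = (n + p + 1) // 2
mu_hat, Sigma_hat, subset = c_step_mcd(X, h, rng)
```

Every pass solves a p-dimensional linear system and evaluates a p-dimensional determinant, which is where the cubic cost in p discussed in the text comes from.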


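The MCD algorithm can be formulated as an optimization problem

$$(\hat{\mu}, \hat{\Sigma}, \hat{H}) = \underset{\mu, \Sigma, H}{\arg\min}\ \{E(\mu, \Sigma, H)\},$$

where

$$E(\mu, \Sigma, H) = \log\{\det(\Sigma)\} + \frac{1}{h} \sum_{i \in H} (\mathbf{x}_i - \mu)^\top \Sigma^{-1} (\mathbf{x}_i - \mu).$$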

The seminal MCD algorithm proposed by [1] turned out to be rather slow and did not scale well as a function
of the sample size n. That limitation led its author to create the so-called FAST-MCD [2], which focuses
on solving the outlier detection problem in a more computationally efficient way. Because the algorithm only
needs to select a limited number h of observations in each loop, its cost can be reduced when the sample
size n is large, since only a subset of the data is used. It must be noted, however, that the bulk of the
computations in MCD has to do with the estimation of determinants and Mahalanobis distances, both
requiring a complexity of $O(p^3)$, where p is the dimensionality of the input space as defined earlier. It therefore
becomes crucial to find out how MCD fares when n is large and p is also large, and even in the now quite ubiquitous
scenario where n is small but p is very large, indeed much larger than n. This p larger than n scenario,
referred to as high dimension low sample size (HDLSS), is very common nowadays in application domains such
as gene expression data from RNA-sequencing and microarrays, audio processing and image processing, to
name just a few. As noted before, with the MCD algorithm, h observations have to be selected to compute the
robust estimator. Unfortunately, when $n \ll p$, the sample covariance matrix is singular, so that its inverse does
not exist and its determinant is zero. As we will show later, the $O(p^3)$ complexity of matrix inversion and determinant computation
renders MCD untenable for p as moderate as 500. It is therefore natural, in the presence of HDLSS datasets, to
contemplate at least some intermediate dimensionality reduction step prior to performing the outlier detection
task. Several algorithms have been proposed, among which PCOut by [3], Regularized MCD (R-MCD) by [4]
and other ideas by [5], [6], [7], [8]. When instability in the data makes the computation of $\Sigma$ problematic in p
dimensions, regularized MCD may be used with objective function

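$$E_{\mathrm{reg}}(\mu, \Sigma, H, \lambda) = E(\mu, \Sigma, H) + \lambda\,\mathrm{trace}(\Sigma^{-1}),$$

where $\lambda$ is the so-called regularizer or tuning parameter, chosen to stabilize the procedure. However, it turns
out that even the above Regularized MCD cannot be contemplated when $p \gg n$, since $\det(\Sigma)$ is always zero
in such cases. That added difficulty is addressed by solving

$$(\hat{\mu}, \hat{\Sigma}, \hat{H}) = \arg\min \left\{ \log\{\det(\Sigma(\alpha))\} + \frac{1}{h} \sum_{i \in H} (\mathbf{x}_i - \mu)^\top \Sigma(\alpha)^{-1} (\mathbf{x}_i - \mu) + \lambda\,\mathrm{trace}(\Sigma(\alpha)^{-1}) \right\} \qquad (4)$$

where the regularized covariance matrix $\Sigma(\alpha)$ is given by

$$\Sigma(\alpha) = (1 - \alpha)\,\hat{\Sigma} + \frac{\alpha}{p}\,\mathrm{trace}(\hat{\Sigma})\, I_p$$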

with $\alpha \in (0,1)$. For many HDLSS datasets, however, the dimensionality p of the input space is very large, with
p in the thousands or beyond being rather common. As a result, even the above direct regularization is
computationally intractable, because when p is large, the $O(p^3)$ complexity of the needed matrix inversion and
determinant calculation makes the problem computationally untenable. The fastest matrix inversion algorithms,
like [9] and [10], are only modestly sub-cubic in theory and so complicated that there is virtually
no useful implementation of any of them. In short, the regularization approach to MCD-like algorithms is
impractical and unusable for HDLSS datasets, even for values of p around a few hundred.

Another approach to outlier detection in the HDLSS context has revolved around extensions and adaptations of principal component
analysis (PCA). Classical PCA seeks to project high dimensional vectors onto a lower dimensional orthogonal
space while maximizing the variance. By reducing the dimensionality of the original data, one seeks to create
a new data representation that evades the curse of dimensionality. However, PCA in its generic form is not
robust, for the obvious reason that it is built from a series of transformations of means and covariance matrices
whose generic estimators are notoriously non-robust. It is therefore of interest to perform PCA in a way
that does not suffer from the presence of outliers in the data, and thereby to identify the outlying observations as
a byproduct of such a PCA. Many authors have worked on the robustification of PCA, among them [11],
who proposed ROBPCA, a robust PCA method that essentially robustifies PCA by combining MCD with
the famous projection pursuit technique ([12], [13]). Interestingly, if instead of reducing the dimensionality based
on robust estimators one first applies PCA to the whole data, the outliers may lie along several
directions in which they are exposed more clearly and distinctly. Such an insight appears to have motivated
the creation of the so-called PCOut algorithm proposed by [3]. PCOut uses PCA as part of its preprocessing
step, after the original data has been scaled by the Median Absolute Deviation (MAD). In fact, in PCOut, each
attribute is transformed as follows:


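$$\mathbf{x}_j^* = \frac{\mathbf{x}_j - \tilde{x}_j}{\mathrm{MAD}(\mathbf{x}_j)}, \quad j = 1, \cdots, p, \qquad (5)$$

where $\mathbf{x}_j = (x_{1j}, \cdots, x_{nj})^\top \in \mathbb{R}^{n \times 1}$ and $\tilde{x}_j$ is the median of $\mathbf{x}_j$. Then, with $\mathbf{X}^* = [\mathbf{x}_1^*, \mathbf{x}_2^*, \cdots, \mathbf{x}_p^*]$, PCA can be
performed, namely

$$\mathbf{X}^{*\top} \mathbf{X}^* = \mathbf{V} \Lambda \mathbf{V}^\top, \qquad (6)$$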


from which the principal component scores $\mathbf{Z} = \mathbf{X}^* \mathbf{V}$ may then be used for the purpose of outlier detection. In
fact, it also turns out that the principal component scores $\mathbf{Z}$ may be re-scaled to achieve a much lower dimension
with 99% of the variance retained. Unlike MCD, this PCA based re-scaled method is not only practical but also performs
better with high dimensional datasets: 99% of simulated outliers are detected when n = 2000, p = 2000. A
higher false positive rate is reported in low dimensional cases, however, and less than half of the outliers were identified
in scenarios with n = 2000, p = 50.

It is clear by now that with HDLSS datasets, some form of dimensionality
reduction is needed prior to performing outlier detection. Unlike the authors just mentioned, who all resorted
to some extension or adaptation of principal component analysis wherein dimensionality reduction is based
on transformational projection, we herein propose an approach where dimensionality reduction is not only
stochastic but also selection-based rather than projection-based.

The rest of this paper is organized as follows.
In Section 2, we present a detailed description of our proposed approach, along with all the needed theoretical
and conceptual justifications. In the interest of completeness, we close that section with a general description
of a nonparametric machine learning kernel method for novelty detection known as the one-class support vector
machine, which under suitable conditions is an alternative to the outlier detection approach proposed in this
paper. Section 3 contains our extensive computational demonstrations on various scenarios. We specifically
present comparisons of the predictive/detection performance of our RSSL based approach against the
PCA based methods discussed earlier. We mainly use simulated data, with simulations seeking to assess
the impact of various aspects of the data such as the dimensionality p of the input space, the contamination
rate $\epsilon$ and other aspects like the magnitude $\gamma$ of the contamination of the scatter matrix. We conclude with
Section 4, in which we provide a thorough discussion of our results along with various pointers to our current
and future work on this rather compelling theme of outlier detection.


2. Random Subspace Learning Approach to Outlier Detection 

2.1. Rationale for Random Subspace Learning 

We herein propose a technique that combines the concept underlying Random Subspace Learning (RSSL) by [14]
with some of the key ideas behind the minimum covariance determinant (MCD) to achieve a computationally efficient,
scalable, intuitively appealing and highly accurate outlier detection method for both HDLSS and LDHSS datasets.
With our proposed method, the computation of the robust estimators of both the location and the scatter matrix can
be achieved by tracing the optimal subspaces directly. Besides, we demonstrate via practical examples that
our RSSL based method is computationally very efficient, specifically because, unlike the
other methods mentioned earlier, it does not require the computationally expensive calculations
of determinants and Mahalanobis distances at each step. Moreover, whenever such calculations are needed,
they are all performed in very low dimensional spaces, further emphasizing the computational strength of our
approach. The original MCD algorithm formulates the outlier detection problem as the problem of finding the
smallest determinant of covariances computed from a sequence $\mathcal{D}^{(k)}$, $k = 1, \cdots, m$, of different subsets of the
original dataset $\mathcal{D}$. Each subset contains h observations. More precisely, if $\mathcal{D}_{\mathrm{optimal}}$ is the subset of $\mathcal{D}$ whose
observations yield the estimated covariance matrix with the smallest determinant out of all the m
subsets considered, then we must have

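$$\det(\hat{\Sigma}(\mathcal{D}_{\mathrm{optimal}})) = \min\left\{\det(\hat{\Sigma}(\mathcal{D}^{(1)})), \det(\hat{\Sigma}(\mathcal{D}^{(2)})), \cdots, \det(\hat{\Sigma}(\mathcal{D}^{(m)}))\right\},$$

where m is the number of iterations needed for the MCD algorithm to converge, and $\mathcal{D}_{\mathrm{optimal}}$ is the subset of $\mathcal{D}$ that
produces the estimated covariance matrix with the smallest determinant. The MCD estimates of the location
vector and scatter matrix parameters are given by

$$\hat{\mu}_{\mathrm{MCD}} = \hat{\mu}(\mathcal{D}_{\mathrm{optimal}}) \quad \text{and} \quad \hat{\Sigma}_{\mathrm{MCD}} = \hat{\Sigma}(\mathcal{D}_{\mathrm{optimal}}).$$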

The number h of observations in each subset is required to satisfy $p < h < n$. According to [15], the choice $h = \lfloor (n + p + 1)/2 \rfloor$
yields the highest possible breakdown value. It is obvious that, with $h = \lfloor (n + p + 1)/2 \rfloor$ giving
the highest breakdown point, the requirement $p < h < n$ cannot be met in the HDLSS context, since in such
a context $p \gg n$. It is therefore intuitively appealing to contemplate a subspace of the input space $\mathcal{X}$, and
to construct such a subspace so that its dimensionality $d < p$ also satisfies $d < n$, allowing the
seamless computation of the needed distances.
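
The obstruction described above can be checked numerically. The sketch below uses illustrative sizes (n = 20, p = 50, d = 4, all our own choices) to show that the full sample covariance is rank deficient when p > n, that the breakdown-optimal h exceeds n, and that a covariance restricted to d < n randomly chosen variables has a positive determinant.

```python
import numpy as np

rng = np.random.default_rng(1)            # illustrative seed
n, p = 20, 50                             # HDLSS toy setting: n much smaller than p
X = rng.standard_normal((n, p))

# The p x p sample covariance has rank at most n - 1, hence a zero determinant.
S_full = np.cov(X, rowvar=False)
rank_full = np.linalg.matrix_rank(S_full)

# The breakdown-optimal subset size cannot satisfy p < h < n here.
h = (n + p + 1) // 2

# Restricting to d < n randomly selected variables restores invertibility.
d = 4
cols = rng.choice(p, size=d, replace=False)
S_sub = np.cov(X[:, cols], rowvar=False)
sign_sub, logdet_sub = np.linalg.slogdet(S_sub)
```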

2.2. Description of Random Subspace Learning for Outlier Detection

Random Subspace Learning in its generic form is designed for precisely this kind of procedure. In a nutshell,
RSSL combines instance-bagging (the bootstrap, i.e. sampling observations with replacement) with attribute-bagging
(sampling indices of attributes without replacement) to allow efficient ensemble learning in high dimensional
spaces. Random Subspace Learning (attribute bagging) proceeds very much like traditional bagging, with the
added crucial step of selecting a subset of the variables from the input space for training, rather than
building each base learner using all p original variables.


Algorithm 2 Random Subspace Learning (RSSL): Attribute-bagging step

1: Randomly draw the number d < p of variables to consider
2: Draw without replacement the indices of d variables of the original p variables
3: Perform learning/estimation in the d-dimensional subspace


This attribute-bagging step is the main ingredient of our outlier detection approach in high dimensional spaces. 
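
As a minimal sketch of the attribute-bagging step in Algorithm 2 (the helper name `attribute_bagging_step` and the uniform draw of d are our own assumptions):

```python
import numpy as np

def attribute_bagging_step(X, rng, d=None):
    """Return the data restricted to d randomly chosen variables, plus their indices."""
    n, p = X.shape
    if d is None:
        d = int(rng.integers(1, p))               # step 1: random d with 1 <= d < p
    cols = rng.choice(p, size=d, replace=False)   # step 2: indices drawn without replacement
    return X[:, cols], cols                       # step 3 would fit a learner on this view

rng = np.random.default_rng(0)                    # illustrative seed and data
X = rng.standard_normal((10, 6))
X_sub, cols = attribute_bagging_step(X, rng, d=3)
X_rand, cols_rand = attribute_bagging_step(X, rng)
```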


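Algorithm 3 Random Subspace Learning for Outlier Detection when $p \ll n$

1: procedure Random Subspace Outlier(B)
2: for b = 1 to B do
3: Draw with replacement $\{i_1^{(b)}, \cdots, i_n^{(b)}\}$ from $\{1, 2, \cdots, n\}$ to form the bootstrap sample $\mathcal{D}^{(b)}$
4: Draw without replacement from $\{1, 2, \cdots, p\}$ a subset $\{j_1^{(b)}, \cdots, j_d^{(b)}\}$ of d variables
5: Drop unselected variables from $\mathcal{D}^{(b)}$ so that $\mathcal{D}^{(b)}_{\mathrm{sub}}$ is d dimensional
6: Build the b-th determinant of covariance $\det(\hat{\Sigma}(\mathcal{D}^{(b)}_{\mathrm{sub}}))$
7: end for
8: Sort the ensemble $\{\det(\hat{\Sigma}(\mathcal{D}^{(b)}_{\mathrm{sub}})),\ b = 1, \cdots, B\}$
9: Form $\mathcal{D}^*$: $\mathcal{D}^* = \arg\min \{\det(\hat{\Sigma}(\mathcal{D}^{(b)}_{\mathrm{sub}})),\ b = 1, \cdots, B\}$
10: Compute $\hat{\mu}^*$ and $\hat{\Sigma}^*$ based on $\mathcal{D}^*$
11: Build the robust distance

$$\delta^*(\mathbf{x}) = (\mathbf{x} - \hat{\mu}^*)^\top \hat{\Sigma}^{*-1} (\mathbf{x} - \hat{\mu}^*). \qquad (7)$$

12: end procedure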
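
A compact sketch of Algorithm 3 follows. It reflects our reading of the pseudocode: the function name, the use of log-determinants for numerical stability, and the toy data are assumptions, and d must be at least 2 so that each subspace covariance is a matrix.

```python
import numpy as np

def rssl_outlier_ldhss(X, B, d, rng):
    """Sketch of Algorithm 3: keep the bootstrap/subspace draw with the smallest determinant."""
    n, p = X.shape
    best = (np.inf, None, None)
    for _ in range(B):
        rows = rng.integers(0, n, size=n)               # bootstrap rows (with replacement)
        cols = rng.choice(p, size=d, replace=False)     # d variables (without replacement)
        S = np.cov(X[np.ix_(rows, cols)], rowvar=False)
        sign, logdet = np.linalg.slogdet(S)             # log det avoids under/overflow
        if sign > 0 and logdet < best[0]:
            best = (logdet, rows, cols)
    _, rows, cols = best
    sub = X[np.ix_(rows, cols)]                         # D*: the selected sample
    mu_star = sub.mean(axis=0)
    Sigma_star = np.cov(sub, rowvar=False)
    diff = X[:, cols] - mu_star                         # robust distance (7) for all n points
    delta = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma_star, diff.T).T)
    return delta, cols

rng = np.random.default_rng(0)                          # illustrative seed and data
n, p = 300, 10
X = rng.standard_normal((n, p))
is_out = np.zeros(n, dtype=bool)
is_out[:30] = True
X[is_out] += 5.0                                        # 10% shifted outliers
delta, cols = rssl_outlier_ldhss(X, B=50, d=3, rng=rng)
```

Observations would then be flagged by comparing `delta` with a chi-squared quantile with d degrees of freedom, for instance one obtained from `scipy.stats.chi2.ppf`; SciPy is left out of the sketch to keep it dependent on NumPy only.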

The RSSL outlier detection algorithm computes a determinant of covariance for each subsample, with each
subsample residing in a subspace spanned by the d randomly selected variables, where d is usually selected to be
$\min(n/5, \sqrt{p})$. A total of B subsets are generated, and their low dimensional covariance matrices are formed along
with the corresponding determinants. Then the best subsample, meaning the one with the smallest covariance
determinant, is singled out. It turns out that in the LDHSS context ($n \gg p$), our RSSL outlier detection algorithm
robustly yields the estimators $\hat{\mu}^*$ and $\hat{\Sigma}^*$ needed to compute the Mahalanobis distance for all
the observations. The outliers can then be selected using the typical cut-off built on the classical $\chi^2_{d, 5\%}$. In the HDLSS
context, in order to handle the curse of dimensionality, we need to involve a new variable selection procedure
to adjust our framework and concurrently stabilize the detection. The modified version of our RSSL outlier
detection algorithm for HDLSS data is then given by:


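Algorithm 4 Random Subspace Learning for Outlier Detection when $n \ll p$

1: procedure Random Subspace Determinant Covariance(B)
2: for b = 1 to B do
3: Draw with replacement $\{i_1^{(b)}, \cdots, i_n^{(b)}\}$ from $\{1, 2, \cdots, n\}$ to form the bootstrap sample $\mathcal{D}^{(b)}$
4: Draw without replacement from $\{1, 2, \cdots, p\}$ a subset $\{j_1^{(b)}, \cdots, j_d^{(b)}\}$ of d variables
5: Drop unselected variables from $\mathcal{D}^{(b)}$ so that $\mathcal{D}^{(b)}_{\mathrm{sub}}$ is d dimensional
6: Build the b-th determinant of covariance $\det(\hat{\Sigma}(\mathcal{D}^{(b)}_{\mathrm{sub}}))$
7: end for
8: Sort the ensemble $\{\det(\hat{\Sigma}(\mathcal{D}^{(b)}_{\mathrm{sub}})),\ b = 1, \cdots, B\}$
9: Keep the k smallest samples based on the elbow to form $\mathcal{D}^{(r)}_{\mathrm{sub}}$, where $r = 1, \cdots, k$, $k < B$
10: for j = 2 to d do
11: Select the $\nu = j$ most frequent variables left in the retained ensemble to compute $\det(\hat{\Sigma}(\mathcal{D}^{(j)}_{\mathrm{sub}}))$
12: end for
13: Form $\mathcal{D}^*$: $\mathcal{D}^* = \arg\max \{\det(\hat{\Sigma}(\mathcal{D}^{(j)}_{\mathrm{sub}})),\ j = 2, \cdots, d\}$
14: Compute $\hat{\mu}^*$ and $\hat{\Sigma}^*$ based on $\mathcal{D}^*$
15: Build the robust distance

$$\delta^*(\mathbf{x}) = (\mathbf{x} - \hat{\mu}^*)^\top \hat{\Sigma}^{*-1} (\mathbf{x} - \hat{\mu}^*).$$

16: end procedure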


Rather than selecting only the subsample with the smallest determinant of covariance, we select a certain number of subsamples
and achieve the variable selection through a sort of voting process. The most frequently appearing
variables are elected to build an optimal space that allows us to compute our robust estimators. The simulation
results and other details will be discussed later.
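
The voting procedure of Algorithm 4 can be sketched as follows. Several details are our own assumptions where the pseudocode leaves room: k is passed in directly rather than read off an elbow plot, the loop over candidate dimensions is capped at a user-chosen m, and the final estimates use the bootstrap sample with the smallest determinant restricted to the elected variables.

```python
import numpy as np

def rssl_outlier_hdlss(X, B, d, k, m, rng):
    """Sketch of Algorithm 4: vote for variables among the k smallest-determinant draws."""
    n, p = X.shape
    logdets, draws = [], []
    for _ in range(B):
        rows = rng.integers(0, n, size=n)               # bootstrap rows
        cols = rng.choice(p, size=d, replace=False)     # random d-dimensional subspace
        sign, logdet = np.linalg.slogdet(np.cov(X[np.ix_(rows, cols)], rowvar=False))
        logdets.append(logdet if sign > 0 else np.inf)
        draws.append((rows, cols))
    order = np.argsort(logdets)[:k]                     # the k smallest determinants
    freq = np.zeros(p, dtype=int)                       # votes per variable
    for b in order:
        freq[draws[b][1]] += 1
    ranked = np.argsort(-freq, kind="stable")           # most frequent variables first
    best_rows = draws[order[0]][0]                      # sample with the smallest determinant
    best_logdet, best_cols = -np.inf, None
    for j in range(2, m + 1):                           # grow the elected subspace
        cols = ranked[:j]
        sign, logdet = np.linalg.slogdet(np.cov(X[np.ix_(best_rows, cols)], rowvar=False))
        if sign > 0 and logdet > best_logdet:           # keep the largest determinant
            best_logdet, best_cols = logdet, cols
    sub = X[np.ix_(best_rows, best_cols)]
    mu_star, Sigma_star = sub.mean(axis=0), np.cov(sub, rowvar=False)
    diff = X[:, best_cols] - mu_star
    delta = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma_star, diff.T).T)
    return delta, best_cols

rng = np.random.default_rng(0)                          # illustrative seed and data
n, p = 60, 200
X = rng.standard_normal((n, p))
is_out = np.zeros(n, dtype=bool)
is_out[:6] = True
X[is_out] += 5.0
delta, cols = rssl_outlier_hdlss(X, B=100, d=5, k=40, m=8, rng=rng)
```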

2.3. Justification of Random Subspace Learning for Outlier Detection

Conjecture 1. Let $\mathcal{D}$ be the dataset under consideration. Assume that a proportion $\epsilon$ of the observations
in $\mathcal{D}$ are outliers. If $\epsilon < e^{-1}$, then with high probability, the proposed RSSL outlier detection algorithm will
efficiently and correctly identify a set of data that contains very few of the outliers.

Sketch 1. Let $\mathbf{x}_i \in \mathcal{D}$ be a random observation in the original dataset $\mathcal{D}$. Let $\mathcal{D}^{(b)}$ denote the b-th bootstrapped
sample from $\mathcal{D}$. Let $\Pr[\mathbf{x}_i \in \mathcal{D}^{(b)}]$ represent the proportion of observations that are in $\mathcal{D}$ and also present in
$\mathcal{D}^{(b)}$. It is easy to prove that $\Pr[\mathbf{x}_i \in \mathcal{D}^{(b)}] = 1 - (1 - 1/n)^n$. In other words, if $\Pr[\mathbf{x}_i \notin \mathcal{D}^{(b)}] = \Pr[O_n]$ denotes the
proportion of observations from $\mathcal{D}$ not present in $\mathcal{D}^{(b)}$, we must have $\Pr[\mathbf{x}_i \notin \mathcal{D}^{(b)}] = (1 - 1/n)^n = \Pr[O_n]$. Now, $\Pr[O_n]$ is
known to converge to $e^{-1}$ as n goes to infinity. Therefore, for each given bootstrapped sample $\mathcal{D}^{(b)}$, there is a
probability close to $e^{-1}$ that any given outlier will not corrupt the estimation of the location vector and scatter matrix
parameters, since the outliers, like all other observations, have an asymptotic probability of $e^{-1}$ of not
affecting the bootstrapped estimator that we build. Therefore, over a large enough re-sampling process (large B),
there will be many bootstrapped samples with very few outliers, leading to a sequence of small covariance
determinants as desired, if $\epsilon < e^{-1}$. It is therefore reasonable to deduce that by averaging this exclusion of
outliers over many replications, robust estimators will naturally be generated by the RSSL algorithm.
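
The limit used in the sketch is easy to check numerically. The snippet below compares $(1 - 1/n)^n$ with $e^{-1}$ and with a Monte Carlo estimate of the chance that a fixed observation is left out of a bootstrap sample; the function names, sample sizes and seed are illustrative assumptions.

```python
import math
import random

def prob_excluded(n):
    """Exact probability that a given observation is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n

def simulate_excluded(n, B, seed=0):
    """Monte Carlo estimate over B bootstrap samples that observation 0 is never drawn."""
    rng = random.Random(seed)
    excluded = 0
    for _ in range(B):
        drawn = {rng.randrange(n) for _ in range(n)}   # indices drawn with replacement
        if 0 not in drawn:
            excluded += 1
    return excluded / B

exact_small = prob_excluded(10)        # 0.9 ** 10
exact_large = prob_excluded(1000)      # already very close to exp(-1)
limit = math.exp(-1)
estimate = simulate_excluded(50, 2000)
```
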

2.4. Alternatives to Parametric Outlier Detection Methods

The assumption of multivariate Gaussianity of the $\mathbf{X}_i$'s is obviously limiting, as it could happen that the data does
not follow a Gaussian distribution. Outside of the realm where the location and scatter matrix play a central role,
other methods have been proposed, especially in the field of machine learning, and specifically with similarity
measures known as kernels. One such method is the One-Class Support Vector Machine (OCSVM)
proposed by [16] to solve the so-called novelty detection problem. It is important to emphasize right away that
novelty detection, although similar in spirit to outlier detection, can be quite different when it comes to the
way the algorithms are trained. The OCSVM approach to novelty detection is interesting to mention here because,
despite some conceptual differences from the covariance methods explored earlier, it is formidable at handling
HDLSS data thanks to the power of kernels. Let $\Phi : \mathcal{X} \rightarrow \mathcal{F}$ denote the feature map. The one-class SVM novelty detection solves


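$$\underset{\mathbf{w} \in \mathcal{F},\ \boldsymbol{\xi} \in \mathbb{R}^n,\ \rho \in \mathbb{R}}{\arg\min}\ \frac{1}{2}\|\mathbf{w}\|^2 + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho$$

subject to

$$\langle \mathbf{w}, \Phi(\mathbf{x}_i) \rangle \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \cdots, n.$$

Using $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle = \Phi(\mathbf{x}_i)^\top \Phi(\mathbf{x}_j)$, we get

$$f(\mathbf{x}_i) = \mathrm{sign}\left(\sum_{j=1}^{n} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) - \rho\right),$$

so that any $\mathbf{x}_i$ with $f(\mathbf{x}_i) < 0$ is declared an outlier. The $\alpha_j$'s and $\rho$ are determined by solving the quadratic
programming problem formulated above. The parameter $\nu$ controls the proportion of outliers detected. One of
the most common kernels is the so-called RBF kernel defined by

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right).$$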


OCSVM has been extensively studied and applied by many researchers, among them [17], [18] and [19], and was later
enhanced by [20]. OCSVM is often applied to semi-supervised learning tasks where training focuses on all the
positive examples (non-outliers), and the detection of anomalies is then performed by searching for points that fall
geometrically outside of the estimated/learned decision boundary of the good (non-outlying) training instances.
It is a concrete and quite popular algorithm for solving one-class problems in fields like digit recognition and
document categorization. However, it is crucial to note that OCSVM cannot be used with many other real
life datasets for which outliers are not well-defined and/or for which there are no clearly identified all-positive
training examples available, such as the gene expression data mentioned before.
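
To illustrate the RBF kernel and the form of the OCSVM decision function, here is a small NumPy sketch. It does not solve the quadratic program: the weights `alpha` (uniform) and the offset `rho` are placeholders chosen by us purely to show how the decision rule is evaluated, and all names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for every row pair of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))   # clip tiny negative round-off

def ocsvm_decision(X_new, X_train, alpha, rho, sigma):
    """sign(sum_j alpha_j K(x, x_j) - rho); a negative sign flags an outlier."""
    scores = rbf_kernel(X_new, X_train, sigma) @ alpha - rho
    return np.sign(scores)

rng = np.random.default_rng(0)                     # illustrative seed and data
X_train = rng.standard_normal((100, 2))
alpha = np.full(100, 1.0 / 100)                    # placeholder weights, NOT a QP solution
rho = 0.1                                          # placeholder offset
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])       # one central point, one far away
labels = ocsvm_decision(X_new, X_train, alpha, rho, sigma=1.0)
K = rbf_kernel(X_train, X_train, sigma=1.0)
```

In practice the $\alpha_j$'s and $\rho$ would come from a quadratic programming solver or an existing SVM library.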


3. Computational Demonstrations 


3.1. Setup of Computational demonstration and initial results 

In this section, we conduct a simulation study to assess the performance of our algorithm based on various
important aspects of the data, and we also provide a comparison of the predictive/detection performance of
our method against existing approaches. All our simulated data are generated according to the $\epsilon$-contaminated
multivariate Gaussian introduced via (1) and (2). In order to assess the effect of the covariance between the
attributes, we use a covariance matrix with a common correlation $\rho$ between all pairs of attributes, of the following form:


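$$\Sigma = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} = (1 - \rho)\, I_p + \rho\, \mathbf{1}_p \mathbf{1}_p^\top, \qquad (8)$$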


where $I_p$ is the p-dimensional identity matrix, while $\mathbf{1}_p$ is the p-dimensional vector of ones. For the remaining
parameters, we consider 3 different levels of contamination $\epsilon \in \{0.05, 0.1, 0.15\}$, ranging from mild to
strong contamination. The dimensionality p takes the values {30, 40, 50, 60, 70} in the low dimensional case and
{1000, 2000, 3000, 4000, 5000} in the high dimensional case, with the number of observations fixed at n = 1500 and n = 100 respectively.
We compare our algorithm to the existing PCA based algorithms PCOut and PCDist, both of which are available in
R within the package called rrcovHD.
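
The covariance structure in (8) is straightforward to build and check. In the sketch below, the value rho = 0.3 and the small dimension p = 5 are arbitrary illustrative choices (the text does not state the value of $\rho$ used), and the helper name is ours.

```python
import numpy as np

def equicorrelation_cov(p, rho):
    """Sigma = (1 - rho) I_p + rho 1_p 1_p^T, as in equation (8)."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

p, rho = 5, 0.3                                   # illustrative values only
Sigma = equicorrelation_cov(p, rho)

# Eigenvalues are 1 - rho (multiplicity p - 1) and 1 + (p - 1) rho,
# so Sigma is positive definite whenever 0 <= rho < 1.
eigvals = np.linalg.eigvalsh(Sigma)               # returned in ascending order

rng = np.random.default_rng(0)                    # illustrative seed
X = rng.multivariate_normal(np.zeros(p), Sigma, size=1500)
```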



Figure 1: (left) Histogram of the distribution of the determinants from all bootstrap samples when n = 100, p = 3000 (horizontal axis: Determinants); (right)
Histogram of the log determinants for all the bootstrap samples (horizontal axis: Log of Determinants). Our methodology later selects a portion of samples based on what we
call here the elbow.

As can be seen in Figure 1, the overwhelming majority of samples lead to determinants that are small, as
evidenced by the heavy right skewness with concentration around zero. This supports our conjecture
that as long as $\epsilon < e^{-1}$, which is a rather reasonable and easily realized assumption, we should be able to isolate samples
with few or no outliers.

Figure 2: (left) Tail of the sorted determinants in the high dimensional case where B = 450; k can be selected before reaching the elbow.
(right) The concave shape can be observed by computing determinants of covariance from 2 to m dimensions. The cut-off $\nu$ for
variable selection is based on the decreasing sorted frequency located at the maximum of the determinants.


Since each selected bootstrapped sample has a small chance of being affected by the outliers, we can select the
dimensionality that maximizes this benefit. In our HDLSS simulations, determinants are computed on all
the randomly selected subspaces and are dominated by small values, which suggests the robustness of
the procedure. Figure 1 clearly shows the dominance of small values of determinants, which in this case are
the determinants of all bootstrapped samples based on our simulated data. A distinguishable elbow is visible
in Figure 2. The next crucial step lies in selecting a certain number of bootstrap samples, say k, to build an
optimal subspace. Since most of the determinants are close to each other, this is a non-trivial problem:
k needs to be chosen carefully to avoid going beyond the elbow. However, it is important to notice that if
k is too small, then the variable selection in later steps of the algorithm becomes a random pick, because there
is no opportunity for each variable to appear in the ensemble. Here, we choose k to be roughly the
first 30% to 80% of the B bootstrap samples, sorted in ascending order of their determinants. This choice
is based on our empirical experimentation. It is not too difficult to infer the asymptotic normal distribution
of the frequencies of all variables in the retained ensemble, as we can observe in Figure 2. Thus, the most frequently appearing
variables located on the left tail can be kept to build our robust estimator. Once the selection of k is
made, the frequencies of the variables appearing in this ensemble can be computed for variable selection.
The 2 to m most frequently appearing variables are included to compute the determinants in Figure 2. Here m is
usually small, since we assume from the start that the true dimensionality of the data is indeed small;
for instance, we choose m = 20 for the purposes of our computational demonstration. A sharp maximum indicates
the number of dimensions $\nu$ from that sorted ensemble that we need to choose. Thus, with the bootstrapped
observations having the smallest determinant, together with the subspace that generates the largest determinant, we can
successfully compute $\mathcal{D}^*$. Then the robust estimators can be formed as $\hat{\mu}^*$ and $\hat{\Sigma}^*$. Theoretically,
then, we are in the presence of a minimax formulation of our outlier detection problem, namely

$$\mathcal{D}^* = \underset{\mathcal{F}}{\arg\max}\left\{ \underset{b}{\arg\min}\left\{ \det\left(\mathrm{cov}\left(\mathcal{D}^{(b)}_{\mathcal{F}}\right)\right) \right\} \right\}, \qquad (9)$$

where $\mathcal{F}$ ranges over the candidate subspaces and b over the bootstrap samples. By Equation (9), it should be understood that we need to isolate the precious subsample $\mathcal{D}^*$ that achieves the
smallest overall covariance determinant, but then concurrently identify along with it the subspace $\mathcal{F}^*$ that
yields the highest value of that covariance determinant among all the possible subspaces considered.


3.2. Further results and Computational Comparisons 

As indicated in our introductory section, we use the Mahalanobis distance as our measure of proximity. Since
we are operating under the assumption of multivariate normality, we use the traditional chi-squared quantiles
$\chi^2_{d,\alpha}$ as our cut-off, with the typical $\alpha = 10\%$ and $\alpha = 5\%$. As usual, all observations with distances larger
than $\chi^2_{d,\alpha}$ are classified as outliers. The data for the simulation study are generated with $\kappa, \eta \in \{2, 5\}$, where $\kappa$ denotes
the contamination level of the scatter matrix and $\eta$ the shift of the location, representing
respectively hard and easy situations for the RSSL algorithm to detect the outliers, and with the contamination rate $\epsilon$ and the dimensionality p varied as described above.
Throughout, we use R = 200 replications for each combination of parameters for each algorithm, and we use
the average test error AVE as our measure of predictive/detection performance. Specifically,


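$$\mathrm{AVE}(\hat{f}) = \frac{1}{R}\sum_{r=1}^{R}\left\{\frac{1}{m}\sum_{i=1}^{m} \ell\left(y_i^{(r)}, \hat{f}_r(\mathbf{x}_i^{(r)})\right)\right\}, \qquad (10)$$

where $\hat{f}_r(\mathbf{x}_i^{(r)})$ is the predicted label of the test set observation i yielded by $\hat{f}$ in the r-th replication. The loss
function used here is the basic zero-one loss defined by:

$$\ell\left(y_i^{(r)}, \hat{f}_r(\mathbf{x}_i^{(r)})\right) = \begin{cases} 1 & \text{if } y_i^{(r)} \neq \hat{f}_r(\mathbf{x}_i^{(r)}), \\ 0 & \text{otherwise.} \end{cases} \qquad (11)$$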
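
The error measure in (10) and (11) amounts to averaging misclassification rates over replications. A minimal sketch follows, with made-up labels for two replications (1 = outlier, 0 = normal); the function names are our own.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """Elementwise zero-one loss (11): 1 where the labels disagree, 0 otherwise."""
    return (np.asarray(y_true) != np.asarray(y_pred)).astype(float)

def average_test_error(y_true_reps, y_pred_reps):
    """AVE in (10): mean over replications of the per-replication mean loss."""
    per_rep = [zero_one_loss(y, yhat).mean() for y, yhat in zip(y_true_reps, y_pred_reps)]
    return float(np.mean(per_rep))

# Two toy replications with m = 4 test observations each.
y_true_reps = [[0, 0, 1, 1], [1, 0, 0, 0]]
y_pred_reps = [[0, 1, 1, 1], [1, 0, 0, 0]]
ave = average_test_error(y_true_reps, y_pred_reps)   # (0.25 + 0.0) / 2
```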


It will be seen later that our proposed method produces accurate outlier detection results, typically
competing favorably against other techniques and often outperforming them. First, however, we show in
Figure 3 the detection performance of our algorithm based on two randomly selected subspaces. The outliers
detected by our algorithm are identified by red triangles and contained in the red contour, while the black circles
are the normal data.




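Figure 3: (left) The outliers detected in a two dimensional subspace are marked as red triangles (horizontal axis: X1). Selection is based on $\delta^*(\mathbf{x})$ exceeding the chi-squared cut-off;
(right) outliers are selected using $\chi^2_{d, \alpha = 10\%}$ (horizontal axis: X1).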


The improvement of our random subspace learning algorithm on low dimensional data with $p \in \{30, 40, 50, 60, 70\}$
and relatively large sample size n = 1500 is demonstrated in Figure 4, in comparison to PCOut and PCDist. Given
a relatively easy task, namely with $\kappa, \eta = 5$, where the outliers are scattered widely and shifted far from the normal data, the
RSSL with $1 - \alpha$ equal to 95% and 90% performs consistently very well, typically outperforming the competition.
When the rate of contamination increases in this scenario, almost 100% accuracy can be achieved with
the RSSL based algorithm. When the outliers are spread more narrowly and closer to the mean, with $\kappa, \eta = 2$,
the predictive accuracy of our random subspace based algorithm is slightly lower but still very strong,
with a detection rate of about 96% to 99%. In high dimensional settings, namely with
$p \in \{1000, 2000, 3000, 4000, 5000\}$ and low sample size n = 100, RSSL also performs reasonably well, as shown
in Figure 5. With the $1 - \alpha = 95\%$ chi-squared cut-off, when $\kappa, \eta = 5$, 96% to 98% of outliers are detected
consistently across all simulated high dimensions. Under more difficult conditions, as with $\kappa, \eta = 2$, a decent
share of outliers can still be detected, with accuracy around 92% to 96%. Based on the properties of robust PCA
based algorithms, the situation that we define as "easy" for RSSL algorithms is actually "harder" for PCOut
and PCDist. The principal component space is selected based on the visibility of outliers, and especially for
PCOut, the components with nonzero robust kurtosis are assigned higher weights according to the absolute value of their
kurtosis coefficients. This method is shown to yield good performance when dealing with a small shift of the mean
and scatter of the covariance matrix. However, if the outliers correspond to larger $\eta$ and $\kappa$, where many choices
of direction are possible, it is more difficult for PCA to find the dimensionality that makes the outliers "stick out".
Conversely, with small values of $\kappa$ and $\eta$, the most obvious directions are emphasized by PCA, but there is less chance
for algorithms like RSSL to obtain the most sensible subspace in which to build robust estimators. So in Figure 5, when
$\kappa, \eta = 2$, the accuracy is reduced to around 92%, but in all other high dimensional settings the performance of
RSSL is consistent with that of PCOut and equally stable.

Figure 4: The average error and standard deviation in the low dimensional simulation with $\kappa, \eta = 5$ (left column) and $\kappa, \eta = 2$ (right
column).

Figure 5: The average error and standard deviation in the high dimensional simulation with $\kappa, \eta = 5$ (left column) and $\kappa, \eta = 2$ (right
column).

4. Conclusion


We have presented what we can rightfully claim to be a computationally efficient, scalable, intuitively appealing
and highly accurate outlier detection method for both HDLSS and LDHSS datasets. As an
adaptation of both random subspace learning and the minimum covariance determinant, our proposed approach
can be readily used on a vast number of real life examples where both of its component building blocks have been
successfully applied. The random subspace learning aspect of our method is particularly
handy for many outlier detection tasks on high dimension low sample size datasets, like DNA microarray gene
expression datasets, for which the MCD approach proved to be computationally untenable. As our computational
demonstrations section above reveals, our proposed approach competes favorably with other existing methods,
sometimes outperforming them predictively despite its straightforwardness and relatively simple implementation.
Specifically, our proposed method is shown to be very competitive for both low dimensional and
high dimensional outlier detection, and it is computationally very efficient. We are currently seeking out
interesting real life datasets on which to apply our method. We also plan to extend our method beyond settings
where the underlying distribution is Gaussian.